ABSTRACT

We study the ability of PINOCCHIO (PINpointing Orbit-Crossing Collapsed Hier- 
archical Objects) to predict the merging histories of dark matter (DM) haloes, com- 
paring the PINOCCHIO predictions with the results of two large N-body simulations 
run from the same set of initial conditions. We focus our attention on quantities most 
relevant to galaxy formation and large-scale structure studies. PINOCCHIO is able to 
predict the statistics of merger trees with a typical accuracy of 20 per cent. Its validity 
extends to higher-order moments of the distribution of progenitors. The agreement is 
valid also at the object-by-object level, with 70-90 per cent of the progenitors cleanly 
recognised when the parent halo is cleanly recognised itself. Predictions are presented 
also for quantities that are usually not reproduced by semi-analytic codes, such as the 
two-point correlation function of the progenitors of massive haloes and the
distribution of initial orbital parameters of merging haloes. Because of the
accuracy of its predictions and the ease with which merger histories are
produced, PINOCCHIO provides a means to generate catalogues of DM haloes
that is highly competitive with large-scale N-body simulations, making it a
suitable tool for galaxy formation and large-scale structure studies.

Key words: galaxies: haloes - galaxies: formation - galaxies: clustering - cosmology: 
theory - dark matter 



1 INTRODUCTION 

In the Hierarchical Clustering Scenario, structure in the Uni- 
verse forms from the aggregation and merging of smaller 
subunits. This theoretical picture is now substantiated, at 
least on a qualitative level, by a wealth of observations of the 
high-redshift (z ~ 3-5) Universe. In the most commonly
discussed scenario, the 'Cold Dark Matter' (CDM) one, hi- 
erarchical clustering is driven by the gravitational collapse 
of DM fluctuations, while the visible astrophysical objects 
are generated from baryons falling into the DM haloes (see, 
e.g., White & Rees 1978). Thus the process of formation and 
evolution of DM haloes is of fundamental importance for un- 
derstanding the properties of galaxies or galaxy clusters. 

The formation of DM haloes involves highly non-linear
dynamical processes which cannot be followed analytically.
To address this problem it is necessary to resort to numerical
N-body simulations. Besides this time-consuming method,
one can also use analytical approximations that are able
to predict with fair accuracy some relevant quantities
related to the assembly of DM haloes. Moreover, the analytic
methods help to shed light on the complex gravitational
problem of hierarchical clustering. The pioneers of the
analytical approach were Press & Schechter (1974; hereafter
PS) who derived an expression for the mass function of DM 
haloes. This was found to give a fair approximation of the N- 
body results (Efstathiou et al. 1988; see for a review Monaco 
1998). The PS approach was extended by Bond et al. (1991; 
see also Peacock & Heavens 1990; Bower 1991; Lacey & 
Cole 1993) (Extended PS formalism, hereafter EPS), who 
fixed a normalization problem of the original PS work. The 
EPS model can be used to predict also some properties of 
DM haloes, such as their formation time, survival time and 
merger rate. These predictions were tested against numeri- 
cal simulations, again with success, by Lacey & Cole (1994). 
The EPS formalism has recently become a standard tool to 
construct synthetic catalogues of DM haloes for galaxy for- 
mation programs (see, e.g., Kauffmann, White & Guiderdoni 
1993; Somerville & Primack 1999; Cole et al. 2000). 

However, recent work with larger N-body simulations 
has revealed significant discrepancies between PS and EPS 
predictions and numerical results. The PS mass function 
has been shown to underpredict the number of massive 
haloes and over-predict the number of low-mass ones (Gelb 



© 0000 RAS 



2 Taffoni, Monaco & Theuns 



& Bertschinger 1994, Governato et al. 1999; Jenkins et al. 
2001; Bode et al. 2001). Similar discrepancies were observed 
in the reconstruction of the conditional mass function, i.e. 
the number density of haloes bound to flow into a parent 
halo of given mass at a subsequent time * (Somerville & 
Kolatt 1999; Sheth & Lemson 1999b). The EPS formalism 
is also affected by limitations, in that it does not give full 
information on the spatial distribution of haloes (Catelan et 
al. 1998; Jing 1998, 1999; Porciani, Catelan & Lacey 1999), 
and by inconsistencies in the use of smoothing filters and in 
the construction of merger trees (Somerville & Kolatt 1999; 
Sheth & Lemson 1999b; Cole et al. 2001). Attempts to im- 
prove this formalism, or to develop alternative ones, were 
reviewed by Monaco (1998). A more recent and successful 
extension is due to Sheth & Tormen (1999) and Sheth, Mo 
& Tormen (2001); their model improves significantly the fit 
of the mass function and the extension of dynamics to el- 
lipsoidal collapse (EPS is based on linear theory), but does 
not remove the inconsistencies of the EPS approach and 
does not provide spatial information of haloes either. This 
method has also been applied to build random realizations of 
the merging histories of DM haloes (Sheth & Tormen 2001), 
but it does not provide a significant improvement with respect to
the standard merger trees.

Recently, we have presented a new algorithm, called 
PINOCCHIO (PINpointing Orbit-Crossing Collapsed Hier- 
archical Objects), to generate synthetic catalogues of DM 
haloes with known mass, position, peculiar velocity, merger 
history and angular momentum (Monaco et al. 2001, here- 
after paper I; Monaco, Theuns & Taffoni 2001, hereafter 
paper II). In contrast to EPS, PINOCCHIO is able both 
to reproduce statistical quantities, such as the mass or two- 
point correlation function of haloes, and to reproduce haloes 
on a point-by-point basis.

In this paper, we investigate in detail how well PINOCCHIO
is able to recover the merger histories (or merger
trees) of DM haloes. We compare the PINOCCHIO code
with numerical N-body simulations and with the analytical
estimates of the EPS theory. We examine the ability of
PINOCCHIO to reconstruct the main statistical properties of
the merger trees, extending the analysis to predict the
correlation function and the initial orbital parameters of merging
haloes.

Section 2 gives a brief description of the PINOCCHIO 
code with special attention to the extraction of the merger 
trees. In Section 3 we compare the statistical properties of 
the distribution of DM haloes at different redshifts given 
by PINOCCHIO with the results of numerical N-body sim- 
ulations. Section 4 is dedicated to the study of the spatial 
distribution of the haloes that will form cluster sized objects 
at the present time. Section 5 shows the ability of PINOC- 
CHIO to predict the impact parameters of merging haloes. 
The conclusions are reported in Section 6. 



* In the following, the 'final' haloes at z = 0 (or occasionally
at higher redshift) will be called parents, while the higher-redshift
haloes that flow into the parent will be called progenitors.



2 MERGER TREES FROM PINOCCHIO 

The PINOCCHIO code was presented in paper I and de- 
scribed in full detail in paper II. Here we give only a brief 
description of the code, necessary to discuss the procedure 
used to extract the merger trees. 

For a given cosmological background model and a power
spectrum of fluctuations, a Gaussian linear density contrast
field $\delta_l$ (i.e. linearly extrapolated to z = 0) is generated on a
cubic grid, in a way very similar to what is usually done to
generate initial conditions for N-body simulations. The linear
density contrast $\delta_l$ is smoothed repeatedly with Gaussian
filters of FWHM R, where R takes values that are equally
spaced (~20 smoothing radii usually give an adequate
sampling). For each point q of the Lagrangian (initial)
coordinate and for each smoothing radius R, the collapse time (i.e.
the time at which the particle is predicted to enter a
high-density, multi-stream region) is computed using Lagrangian
Perturbation Theory (hereafter LPT; see e.g. Bouchet 1996,
Buchert 1996, Catelan 1995) and its ellipsoidal truncation
(Monaco 1997). Technically, the collapse time is defined as
the instant of Orbit Crossing (OC); this definition is
discussed at length in paper II (see also Monaco 1995, 1997a).
For each particle only the earliest collapse time is recorded,
which amounts to recording the field

F_{\rm max}(\mathbf{q}) \equiv \max_R \left[ \frac{1}{b(t_c)} \right] .    (1)

Here $b(t)$ is the linear growing mode (see Padmanabhan
1993; Monaco 1998) and $t_c$ is the OC-collapse time. Notice
that in an Einstein-de Sitter Universe $F_{\rm max} = (1 + z_{\rm max})$,
where $z_{\rm max}$ is the largest redshift at which the particle
is predicted to collapse.†
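The bookkeeping behind equation (1) can be illustrated with a minimal Python sketch. It assumes that the growth factor at collapse, b(t_c), has already been computed for each smoothing radius; the function names and the use of None for "never collapses" are hypothetical and are not taken from the PINOCCHIO code.

```python
def record_f_max(collapse_growth, radii):
    """Return (F_max, R_max) for one particle.

    collapse_growth[i] is b(t_c) for smoothing radius radii[i], or None
    if the particle never collapses at that radius (a convention assumed
    here for illustration).
    """
    best_f, best_r = None, None
    for b_c, radius in zip(collapse_growth, radii):
        if b_c is None:
            continue
        f = 1.0 / b_c  # F = 1 / b(t_c): a larger F means an earlier collapse
        if best_f is None or f > best_f:
            best_f, best_r = f, radius
    return best_f, best_r


def eds_collapse_redshift(f_max):
    """Einstein-de Sitter only: b = 1 / (1 + z), hence z_max = F_max - 1."""
    return f_max - 1.0
```

For example, a particle with b(t_c) = 0.25, 0.125 and 0.5 at three radii has its earliest collapse at the second radius, which is the one stored as R_max.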

Besides $F_{\rm max}$, we record also the smoothing radius $R_{\rm max}$
for which the maximum in equation (1) is reached, and the
velocity of the particle at collapse time as given by the
Zel'dovich (1970) approximation (which is the first-order
term of LPT),

\mathbf{v}_{\rm max} = -b(t_c)\, \nabla\phi(\mathbf{q}; R_{\rm max})    (2)

(in units of comoving displacement), where $\phi(\mathbf{q}; R_{\rm max})$ is
the rescaled peculiar gravitational potential (smoothed at
$R_{\rm max}$), which obeys the Poisson equation $\nabla^2 \phi(\mathbf{q}) = \delta_l$. All
differentiations and convolutions are performed using fast
Fourier transforms.

† Taking the largest redshift (or F-value) of collapse is analogous
to considering the largest collapse radius, as done in the EPS
formalism.

Figure 1. Conditional mass functions in the ΛCDM case for parent haloes identified at $z = 0$. The mass threshold is fixed at
$M_{\rm th} = 7.6 \times 10^{10}\,M_\odot$ (10 particles); the redshift increases from left to right and covers the values $z = 1, 2, 4$. The mass of the parent
halo increases from top to bottom; the adopted values are $M_0 = 5 \times 10^{12}\,M_\odot$, $3 \times 10^{13}\,M_\odot$ and $2 \times 10^{14}\,M_\odot$. The points represent
the simulation data while the solid lines are the predictions of PINOCCHIO; the dashed lines are the analytical predictions of the EPS
formalism.

The collapsed medium is then "fragmented" into isolated
objects using an algorithm designed to mimic the accretion
and merger events of hierarchical collapse. Collapsed
particles may belong to relaxed haloes or to lower-density
filaments. At the instant a particle is deemed to collapse,
we decide which halo, if any, it accretes onto. The candidate
haloes are those that already contain one Lagrangian
neighbour of the particle (on the initial grid q of Lagrangian
positions, the six particles nearest to a given one are its
"Lagrangian neighbours"). The particle accretes onto
the halo if its distance (in the Eulerian space at the collapse
time) from the centre of mass of the candidate halo is
smaller than a given fraction of the halo size ($R_N = N^{1/3}$ in
grid units, where N is the number of particles in the halo);
otherwise it is catalogued as a filament particle. If a particle has
more than one candidate halo, we check whether these haloes
should merge. The merging condition is very similar to the
accretion one: two groups merge if their distance in the
Lagrangian space is smaller than a fraction of the size of the
largest halo. If a particle is a local maximum of $z_{\rm max}$ it is
considered as the seed of a new halo. Finally, filament
particles are accreted onto a halo when they neighbour in the
Lagrangian space an accreting particle; this is done to mimic
accretion of filaments onto haloes.
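Two ingredients of the fragmentation step, the six Lagrangian neighbours and the accretion test against the halo size, can be sketched as follows. This is only an illustration: the periodic wrapping of the grid and the parameter name `fraction` are assumptions, and the actual value of the fraction is one of the free parameters of paper II and is not given here.

```python
def lagrangian_neighbours(i, j, k, n_grid):
    """Six nearest grid points of (i, j, k) on an n_grid^3 lattice.

    Periodic wrapping is assumed here for illustration.
    """
    steps = ((1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1))
    return [((i + di) % n_grid, (j + dj) % n_grid, (k + dk) % n_grid)
            for di, dj, dk in steps]


def halo_size(n_particles):
    """Halo size R_N = N^(1/3) in grid units, as defined in the text."""
    return n_particles ** (1.0 / 3.0)


def accretes(distance, n_particles, fraction):
    """True if the particle lies within fraction * R_N of the halo centre.

    `fraction` is a placeholder for a free parameter of the algorithm.
    """
    return distance < fraction * halo_size(n_particles)
```

With a hypothetical fraction of 0.5, a particle at 1 grid unit from the centre of a 27-particle halo (R_N of about 3) would accrete, while one at 2 grid units would not.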

The fragmentation algorithm requires the introduction
of free parameters, which are analogous to those required by
any clump-finding algorithm applied to N-body simulations.
These parameters specify the level of overdensity at which
the halo is defined, or are introduced to fix resolution effects.
They are discussed in paper II (to which we refer for all
details) and are chosen so as to fit the mass function of
haloes selected with the friends-of-friends (FOF) algorithm
with linking length 0.2 times the mean inter-particle
distance at z = 0. Here we use for the parameters the values
found in paper II.
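For reference, the friends-of-friends selection against which the parameters are calibrated can be sketched in a few lines of Python. This is a brute-force toy version written for clarity, not the halo finder used for the simulations: it ignores periodic boundaries, scales as N^2, and all names are hypothetical.

```python
import math


def friends_of_friends(positions, box_size, linking_fraction=0.2):
    """Group particles closer than linking_fraction times the mean spacing.

    positions is a list of (x, y, z) tuples in a cubic box of side box_size.
    Returns a sorted list of groups, each a list of particle indices.
    """
    n = len(positions)
    mean_spacing = box_size / n ** (1.0 / 3.0)
    link = linking_fraction * mean_spacing
    parent = list(range(n))

    def find(i):
        # Root of the set containing i, with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            sep = math.sqrt(sum((a - b) ** 2
                                for a, b in zip(positions[i], positions[j])))
            if sep < link:
                parent[find(i)] = find(j)  # join the two groups

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())
```

Note that two particles further apart than the linking length still end up in the same group if a chain of closer friends connects them.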

To give a taste of the speed of PINOCCHIO with respect
to simulations, a $256^3$-particle realization needs about
6 hours on a Pentium III 450 MHz computer, with a RAM
requirement of about 512 Mb; the corresponding simulation
required a week on a 10-processor parallel computer.
Statistical quantities like the mass function or the two-point
correlation function are reproduced with a typical error of ~10 per
cent or smaller. At the object-by-object level, the accuracy
of PINOCCHIO depends on the degree of non-linearity that
is reached at the grid level, and degrades in time. Typically,
70-99 per cent of the objects are reproduced with an error
on the mass of 30-40 per cent, an error on the position of
~0.5-2 grid points and a 1D error on the velocity of ~150
km/s at z = 0.

Figure 2. Same as in Fig. 1 but for the SCDM case. The mass threshold is $M_{\rm th} = 1.49 \times 10^{13}\,M_\odot$ (10 particles).

It is noteworthy that merging events in PINOCCHIO 
are not restricted to be binary: in principle up to six objects 
can merge together at the same time, even if, as expected, 
the number of mergers that involve more than three haloes
is a very small fraction of the total. 

The merger histories of haloes are directly evaluated by
PINOCCHIO. At each merger the largest halo retains its
identification number (ID), which becomes the ID of the
merged halo, while the other haloes are labelled as expired. The
mass of each halo involved in the merging event is recorded
together with the redshift at which the merger takes place.
For each expired (progenitor) halo we keep track at all times
of the (parent) halo it is presently incorporated within.
Even though accretion is rigorously defined as the entrance
of a single particle into the object, the merger of a halo with
another one with less than 10 particles is always considered
as an accretion event.
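The bookkeeping described above can be illustrated with a small Python sketch. The class and attribute names are hypothetical and the data layout is an assumption made for illustration; it is not the data structure used inside PINOCCHIO.

```python
class MergerTree:
    """Minimal bookkeeping of halo masses, expiries and merger events."""

    def __init__(self, min_particles=10):
        self.min_particles = min_particles
        self.mass = {}         # live halo ID -> current number of particles
        self.absorbed_by = {}  # expired halo ID -> ID of the absorbing halo
        self.events = []       # (redshift, surviving ID, [(ID, mass), ...])

    def add_halo(self, halo_id, n_particles):
        self.mass[halo_id] = n_particles

    def merge(self, halo_ids, redshift):
        """Merge haloes; the largest keeps its ID and the others expire."""
        ranked = sorted(halo_ids, key=lambda h: self.mass[h], reverse=True)
        survivor, others = ranked[0], ranked[1:]
        # Haloes below the threshold count as accretion, not as mergers.
        merging = [(h, self.mass[h]) for h in others
                   if self.mass[h] >= self.min_particles]
        if merging:
            # Masses are recorded as they were just before the merger.
            self.events.append(
                (redshift, survivor, [(survivor, self.mass[survivor])] + merging))
        for h in others:
            self.mass[survivor] += self.mass.pop(h)
            self.absorbed_by[h] = survivor
        return survivor

    def current_parent(self, halo_id):
        """Follow the chain of absorptions to the halo that exists now."""
        while halo_id in self.absorbed_by:
            halo_id = self.absorbed_by[halo_id]
        return halo_id
```

Because `merge` accepts any number of halo IDs, the sketch is not restricted to binary mergers, in line with the remark above.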

The merger trees extracted from PINOCCHIO provide 
a more complete description of the merging histories of 
haloes than the EPS one. They not only follow the time
evolution of the mass and number distribution of the pro- 
genitors, but also their distribution in space, their velocities 
and angular momenta. 



3 STATISTICS OF THE PROGENITORS 
3.1 The simulations 

In order to test the ability of PINOCCHIO in predicting the
statistics of the merger trees, we compared the results of two
N-body simulations with those of PINOCCHIO applied to
the same initial density field. The simulations were already
presented in paper I and II, to which we refer for all the
details. They are a standard CDM model (SCDM), run with
the PKDGRAV code on a large box of 500 $h^{-1}$ Mpc‡ with
$360^3$ particles (Governato et al. 1999), and a ΛCDM model,
run with the Hydra code (Couchman, Thomas & Pearce
1993) on a smaller box of 100 $h^{-1}$ Mpc with $256^3$ particles.
The reason why we use two different simulations is to check
the method for different cosmologies, boxes, resolutions and
codes. For the present purpose, the ΛCDM simulation is
more suitable as the higher mass resolution allows one to
reconstruct the merger tree to higher redshifts and lower
masses, but the SCDM allows one to test the merger trees
for the more massive haloes.

The haloes are identified using a standard FOF algorithm
with linking length 0.2 times the inter-particle distance.
Note that, following the suggestion by Jenkins et al.
(2001), we do not change linking length with the cosmology.
In this paper, we adopt 10 particles as the minimum
mass of the haloes when we analyse the conditional mass
function. This is to test the effect of the degrading of the
agreement at small masses. In general, at least 30 particles
are necessary to identify reliably a halo both in the
simulations and in PINOCCHIO, so we consider a threshold mass
of 30 particles for the other statistical analyses.

‡ The Hubble constant is assumed to be $H_0 = 100\,h$ km s$^{-1}$ Mpc$^{-1}$.

Table 1. Parameters of the considered numerical simulations.

Cosmology  $\Omega_0$  $\Omega_\Lambda$  $h$   $\sigma_8$  (?)  box size   $N_{\rm part}$  $M_{\rm part}$                  output redshifts
ΛCDM       0.3         0.7               0.65  0.9         1.   100 Mpc/h  $256^3$         $7.64 \times 10^{10}\,M_\odot$  0, 0.25, 0.5, 0.75, 1, 2, 3, 4, 5
SCDM       1.          0.                0.5   1.          1.   500 Mpc/h  $360^3$         $1.49 \times 10^{12}\,M_\odot$  0, 0.43, 1.13, 1.86

The merger trees for the FOF haloes at final time $z_0$
are constructed as follows. Progenitors are defined as those
haloes that at the higher redshift $z$ contain some of the
particles of the parent halo at $z_0$. As noted by some authors
(see e.g. Somerville et al. 2000), some particles that are
located in a progenitor are not included later into the parent.
This reflects the actual dynamics of the haloes, which suffer
stripping and evaporation events, and makes the progenitor
identification process more ambiguous. We then adopt two
simple rules:

(i) if a parent halo contains less than 90 per cent of the
mass of all its progenitors at redshift $z$, then it is excluded
from the analysis (this happens in a few per cent of cases);

(ii) we assign to the progenitor the mass of all its particles
that will flow into the parent at $z_0$.

In this way we force mass conservation in the merger 
tree and reject some extreme cases when the progenitor is 
strongly affected by these 'evaporation' effects. 
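Rules (i) and (ii) can be sketched in Python as follows. The sketch assumes equal-mass particles and per-particle halo labels at the two redshifts; rule (i) is read here as a threshold on the fraction of progenitor particles that end up in the parent, which is our interpretation of the text. All names are hypothetical.

```python
from collections import Counter


def progenitor_masses(halo_at_z, halo_at_z0, parent_id, keep_fraction=0.9):
    """Apply rules (i) and (ii) to one parent halo.

    halo_at_z[p] and halo_at_z0[p] give the halo ID of particle p at the
    two redshifts (None if the particle is in no halo). Returns a dict
    {progenitor ID: number of its particles that end up in the parent},
    or None if the parent is rejected by rule (i).
    """
    # Progenitors: haloes at z holding at least one particle of the parent.
    contributed = Counter(
        halo_at_z[p] for p, h0 in enumerate(halo_at_z0)
        if h0 == parent_id and halo_at_z[p] is not None)
    # Full size of those progenitors, including particles lost later on.
    total = Counter(h for h in halo_at_z if h in contributed)
    if sum(contributed.values()) < keep_fraction * sum(total.values()):
        return None  # rule (i): too much progenitor mass misses the parent
    return dict(contributed)  # rule (ii): count only particles that flow in
```

In this form the sum of the returned progenitor masses never exceeds the mass of the parent, which is the sense in which mass conservation is enforced in the tree.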

Due to the limited number of available outputs, the 
merger trees obtained from our simulations are very coarse- 
grained in time. This highlights one of the advantages of 
using a code like PINOCCHIO to produce the merger trees. 
In fact, in PINOCCHIO we follow the merging of haloes in 
real time, and then we can link each progenitor to its parent 
after each merging event, while in the simulations (where 
haloes are identified after the run) it is necessary to anal- 
yse and cross-correlate a large number of outputs to follow 
the merger histories. In other words, the generation of the 
merger trees is by far less expensive (in terms of CPU time,
disk space and human labour) in PINOCCHIO than in a
simulation; in fact PINOCCHIO automatically computes the
merging history of haloes and does not need any further
analysis.



3.2 Progenitor mass function 

The progenitor (conditional) mass function
$dN(M, z|M_0, z_0)/dM$ is the number density of progenitors
of mass $M$ at redshift $z$ that merge to form the parent $M_0$
at redshift $z_0$. An estimate of this quantity based on the
PS formalism was found by Bower (1991), while Bond et al.
(1991) gave the basis for computing it in the EPS formalism
(as done by Lacey & Cole 1993).

Figure 3. The distribution of the mass of the largest progenitor
$M_1$ for the ΛCDM case with mass threshold $M_{\rm th} = 2.3 \times 10^{11}\,M_\odot$
(30 particles). The histograms are the PINOCCHIO predictions
and the points connected with solid lines are the simulation results. The
quantity plotted in the upper part of each box is the mean of the
distribution of the mass ratio of the second largest progenitor
$M_2$ to the largest progenitor $M_1$ versus the mass ratio of
the largest progenitor to the parent halo. The solid line is the
PINOCCHIO result and the dashed lines show its 1σ variance.
The points with error bars are the simulation data.

Let $\delta(z)$ be the critical overdensity for a spherical
perturbation to collapse at redshift $z$ and $\Lambda(M) = \sigma^2(M)$ the
variance of the initial density field when smoothed over
regions that contain on average a mass $M$. The fraction of
mass of the parent halo that was in progenitors of mass
$M$ at early time is:

f(M, \delta | M_0, \delta_0)\, dM = \frac{1}{(2\pi)^{1/2}} \frac{(\delta - \delta_0)}{[\Lambda(M) - \Lambda(M_0)]^{3/2}} \exp\left\{ -\frac{(\delta - \delta_0)^2}{2[\Lambda(M) - \Lambda(M_0)]} \right\} \left| \frac{d\Lambda}{dM} \right| dM    (3)

and the conditional mass function is:

\frac{dN}{dM}(M, z | M_0, z_0)\, dM = \frac{M_0}{M}\, f(M, \delta | M_0, \delta_0)\, dM .    (4)

The PINOCCHIO conditional mass function and that
obtained from the simulations are computed by averaging over
a mass interval around $\log M_0$ of 0.01 dex.
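As a numerical illustration of equations (3) and (4), the following Python sketch evaluates the EPS progenitor mass fraction and the conditional mass function for a user-supplied variance $\Lambda(M)$. The power-law variance at the end is a purely hypothetical toy model chosen so that the example is self-contained; it does not correspond to either of the cosmologies considered here, and the collapse thresholds are passed in as plain numbers.

```python
import math


def eps_mass_fraction(M, M0, delta, delta0, variance, dvariance_dM):
    """Equation (3): f(M, delta | M0, delta0), per unit progenitor mass."""
    d_var = variance(M) - variance(M0)  # Lambda(M) - Lambda(M0), > 0 for M < M0
    d_delta = delta - delta0            # > 0 for progenitors at earlier times
    return (d_delta / (math.sqrt(2.0 * math.pi) * d_var ** 1.5)
            * math.exp(-d_delta ** 2 / (2.0 * d_var))
            * abs(dvariance_dM(M)))


def eps_conditional_mass_function(M, M0, delta, delta0, variance, dvariance_dM):
    """Equation (4): dN/dM = (M0 / M) f."""
    return M0 / M * eps_mass_fraction(M, M0, delta, delta0,
                                      variance, dvariance_dM)


# Hypothetical power-law variance Lambda(M) = M^(-ALPHA), illustration only.
ALPHA = 0.5


def toy_variance(M):
    return M ** (-ALPHA)


def toy_dvariance_dM(M):
    return -ALPHA * M ** (-ALPHA - 1.0)
```

A useful consistency check is that the mass-weighted integral of dN/dM over all progenitor masses below $M_0$ returns the parent mass, since in the EPS formalism all the mass of the parent is assigned to progenitors of some mass.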

In Fig. 1 and Fig. 2 we compare the conditional mass
functions obtained from the EPS formalism, PINOCCHIO
and the simulations for the ΛCDM and the SCDM case
respectively. The bottom panels of Fig. 1 show for the ΛCDM
case the results for a cluster-sized parent of $M_0 = 2 \times 10^{14}\,M_\odot$;
the cases of haloes corresponding to small groups
($M_0 = 3 \times 10^{13}\,M_\odot$) and galaxies ($M_0 = 5 \times 10^{12}\,M_\odot$) are
presented in the middle and upper panels. In Fig. 2 we show the
results for parents with mass comparable to massive clusters
($M_0 = 1 \times 10^{15}\,M_\odot$ and $M_0 = 5 \times 10^{15}\,M_\odot$) extracted from
the SCDM simulation. The dotted lines show the EPS
analytical prediction and the points show the expected value
computed from the simulations. The Poissonian errors
associated with the simulation data are of the same width as
the symbols used to plot the simulation data.

Figure 4. As Fig. 3 but for the SCDM case. The mass threshold
is $M_{\rm th} = 1.3 \times 10^{14}\,M_\odot$ (30 particles).

The conditional mass function predicted using PINOCCHIO
(the solid lines in the plots) shows a very good
agreement when compared with the simulations. Fig. 1 and
Fig. 2 show that the PINOCCHIO prediction fits the
simulation data with similar accuracy for all the considered
parent masses and redshifts; the discrepancy between
the two distributions is in general less than 25
per cent. This means that PINOCCHIO reproduces the
conditional mass function with better accuracy than the EPS
prediction, and with an accuracy that is almost constant in
mass and redshift.

On the other hand the figures show a discrepancy al- 
ready pointed out by other authors for the mass function 
of haloes (Gelb & Bertschinger 1994; Governato et al. 1999; 
Jenkins et al. 2001; Bode et al. 2001): the EPS prediction
overestimates the number of low mass progenitors and un- 
derestimates the number of high mass progenitors. This dis- 
crepancy is less evident at high redshift and it ranges from 
30 per cent to a factor of 2 or more depending on the mass 
of the parent halo. 

3.3 Higher-order analysis of the progenitor 
distribution 

We evaluate the distribution of the mass of the largest
progenitor $M_1$ (i.e. the most massive halo that flows into the
parent) for each of the parent haloes analysed before. The
histograms in Fig. 3 and Fig. 4 show the distribution of
the mass of the largest progenitor normalized to the parent
mass, $M_1/M_0$, predicted by PINOCCHIO for the ΛCDM
and SCDM case (in the following the mass threshold is
always set to 30 particles). The symbols connected with lines
denote the corresponding simulation results. The agreement
between the numerical experiment and PINOCCHIO is very
good. Both the mean value and the width of the distribution
are reproduced with good accuracy at all redshifts.

The distribution of $M_1/M_0$ provides also a hint on the
formation time of the parent. In fact, the standard
definition of formation time for a halo of mass $M_0$ is the epoch at
which the mass of its largest progenitor first becomes greater
than $M_0/2$. So we take as the average formation redshift
of a parent halo of mass $M_0$ the redshift at which the peak of
the distribution of $M_1/M_0$ is at one half. The good agreement
of PINOCCHIO with the simulations can thus be extended
also to the halo formation times. For instance Fig. 4
suggests that, in this SCDM cosmology, a halo of $1 \times 10^{15}\,M_\odot$
forms at z ~ 0.43 or later. Notice that a more detailed
analysis of formation times is hampered by the small number of
simulation outputs available.
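For a single merger tree, the standard definition of formation time quoted above can be applied as in the following Python sketch. The function name and the list-of-pairs input format are hypothetical; with only a few outputs available, the returned value is limited to the output redshifts.

```python
def formation_redshift(largest_progenitor_history, parent_mass):
    """Highest output redshift at which the largest progenitor exceeds M0/2.

    largest_progenitor_history is a list of (redshift, M1) pairs, one per
    available output. Returns None if M1 never exceeds half the parent mass.
    """
    candidates = [z for z, m1 in largest_progenitor_history
                  if m1 > 0.5 * parent_mass]
    return max(candidates) if candidates else None
```

For a parent of mass 100 whose largest progenitor has mass 60, 40 and 10 at z = 0.43, 1.13 and 1.86, the function returns 0.43; the true formation redshift lies somewhere between 0.43 and 1.13, which shows how the coarse time sampling limits the analysis.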

In the upper part of the plots of Fig. 3 and Fig. 4 the
distribution of $M_2/M_1$ (the ratio of the second largest
progenitor to the largest one) given $M_1/M_0$ is shown. The points are
the mean value of the distribution and the error bars are
the corresponding 1σ variance, both measured in the
simulations. The solid lines and the dashed lines are the same
quantities predicted by PINOCCHIO. Again the agreement
is very good.

The results reported in this section and in the previous
one suggest that the merging histories of haloes produced by
PINOCCHIO reproduce with very good accuracy the
statistical properties of the masses of those extracted from
numerical simulations. We notice that the EPS-based algorithms
to produce merger trees (Lacey & Cole 1994, Somerville &
Kolatt 1999, Sheth & Lemson 1999, Cole et al. 2000) are by
construction forced to reproduce the EPS analytical
distributions, and they suffer from the same discrepancy noted for
the EPS analytical prediction (Fig. 1 and 2).

3.4 The progenitors in number 

In this section we analyse the statistical properties of the
distribution of the number of progenitors of a halo of mass
$M_0$.

In Fig. 5 and Fig. 6 we show the probability $P(N, M_0)$
that a halo of mass $M_0$ has $N$ progenitors. The average
of these distributions gives (with suitable normalisation) the
integral of the conditional mass function down to the threshold
mass, and is dominated by the more numerous small-mass
objects.

Figure 5. Probability that a halo $M_0$ at $z = 0$ has $N$ progenitors for the ΛCDM case. The threshold mass is $M_{\rm th} = 2.3 \times 10^{11}\,M_\odot$ (30
particles). The points connected with solid lines represent the simulation data while the histograms are the predictions of PINOCCHIO.
The vertical lines are the EPS analytical prediction for the mean of the distribution.

Figure 6. Same as in Fig. 5 for the SCDM case. The threshold mass is $M_{\rm th} = 1.3 \times 10^{14}\,M_\odot$ (30 particles).

Figure 7. The first two moments of the distribution of the
number of progenitors $P(N, M_0)$ as a function of the parent mass
$M_0$. The left plots show the ΛCDM case at redshift $z = 1$ and
2. The threshold mass is $M_{\rm th} = 2.3 \times 10^{11}\,M_\odot$ (30 particles) and
we plot the mean (squares) and the rescaled variance (circles) up
to $M_0 = 1000\,M_{\rm th}$. The solid lines with open symbols are the
PINOCCHIO results and the filled symbols are the simulation
data. The dashed line is the EPS analytical prediction for the
mean. The right plots show the SCDM case at redshift $z = 0.43$
and $z = 1.13$. The threshold mass is $M_{\rm th} = 1.3 \times 10^{14}\,M_\odot$ (30
particles), and we plot the mean and the rescaled variance up to
$M_0 = 50\,M_{\rm th}$.

The histograms show the distribution of the number of
progenitors evaluated from PINOCCHIO for different
parent masses and redshifts. The filled symbols connected with
lines are the distributions extracted from the simulations. We
notice that PINOCCHIO reproduces fairly well the
distributions also for the more massive haloes and at all redshifts.
The ability of PINOCCHIO in predicting the distribution
of the number of progenitors can be quantified by
comparing the first and second moments measured in the
simulations with their values predicted by PINOCCHIO. In Fig. 7
we show the average $\mu_1$ and the rescaled variance $\mu_2/\mu_1$ as
a function of the parent halo mass for different redshifts.
The points connected with solid lines are the PINOCCHIO
prediction and the open symbols are the values measured
from the simulations. The dashed lines are the EPS
analytical prediction for $\mu_1$ computed by integrating equation (4).
Note that for arbitrary initial conditions the EPS
formalism cannot analytically evaluate the higher moments of the
distribution.
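The two moments plotted in Fig. 7 can be estimated from a list of progenitor counts as in the Python sketch below. We assume here that $\mu_2$ denotes the variance about the mean, so that a Poisson distribution would give $\mu_2/\mu_1 = 1$; this reading, like the function name, is an assumption and is not spelled out in the text.

```python
def progenitor_moments(counts):
    """Mean and rescaled variance of progenitor counts.

    counts[i] is the number of progenitors N of the i-th parent halo in a
    given mass bin (the list must be non-empty with a non-zero mean).
    Returns (mu_1, mu_2 / mu_1), with mu_2 taken to be the variance about
    the mean, which is an assumption of this sketch.
    """
    n = len(counts)
    mu1 = sum(counts) / n
    mu2 = sum((c - mu1) ** 2 for c in counts) / n
    return mu1, mu2 / mu1
```

Applying the same function to the PINOCCHIO and FOF progenitor counts of parents in the same mass bin gives the pairs of values that are compared in the figure.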

The agreement between PINOCCHIO and the simulations
varies from 5 per cent for the ΛCDM case to 10 per
cent for the SCDM case, but it does not depend on the
redshift. Again PINOCCHIO is found to improve with respect
to EPS. In particular, at low redshift the EPS predictions
underestimate the mean value by an amount that ranges from
20 per cent to 30 per cent.

Our results can be compared to those shown by Sheth & 
Lemson (1999b) and Somerville et al. (2000) for EPS based 
merger trees and with Sheth & Tormen (2001) who elab- 
orate an excursion set model based on ellipsoidal collapse. 
In general, PINOCCHIO reproduces the statistical
properties of progenitor distributions with better accuracy than
the other methods. It is remarkable that the tests based on
parent haloes with different mass ranges give very similar 
results, reproducing the simulations with a comparable ac- 
curacy. 




Figure 8. Fraction $f_p$ of cleanly assigned progenitors for redshift
$z = 1$, 2 and 4. The left-hand side represents the fraction of
cleanly assigned progenitors as a function of the parent mass. The
three lines correspond to the different redshifts. The three plots on
the right are scatter plots of $f_p$ as a function of the progenitor
mass $M$ normalized to the parent mass $M_0$, for parent haloes
of mass $10^{11}\,M_\odot < M_0 < 10^{15}\,M_\odot$; the redshift increases from
top to bottom.



3.5 Object-by-object comparison 

We finally test the degree of agreement between PINOCCHIO
and the simulations at the object-by-object level for
the number of progenitors that are cleanly reconstructed. In
paper I and paper II a pair of haloes coming from the two
catalogues (PINOCCHIO and FOF) were defined as cleanly
assigned to each other if they overlapped in the Lagrangian
space for at least 30 per cent of their volume and no other
object overlapped with either of them to a higher degree.
The cleanly assigned haloes were shown to overlap on
average at the 60-70 per cent level at all redshifts. The fraction
of cleanly assigned haloes was found to depend on the
degree of non-linearity reached by the system, decreasing from
almost 100 per cent to 70 per cent, at worst, at later times.
This is due to the lower accuracy of the Zel'dovich
approximation in predicting the displacements as the density field
becomes more and more non-linear (see paper II).
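The clean-assignment criterion can be sketched in Python by representing each halo as the set of Lagrangian particle indices it contains. The sketch reads "at least 30 per cent of their volume" as a condition on the shared particles relative to each of the two haloes, and it resolves ties by taking the first best match; both choices, and all names, are assumptions made for illustration rather than the procedure of papers I and II.

```python
def cleanly_assigned_pairs(catalogue_a, catalogue_b, min_overlap=0.3):
    """Pair haloes of two catalogues that are cleanly assigned to each other.

    Each catalogue maps a halo ID to the set of Lagrangian particle indices
    it contains. A pair is kept when the shared particles make up at least
    min_overlap of each halo and each halo is the other's best match.
    """
    def best_match(members, others):
        # ID of the halo in `others` sharing most particles, None if no overlap.
        shared = {hid: len(members & m) for hid, m in others.items()}
        best = max(shared, key=shared.get, default=None)
        return best if best is not None and shared[best] > 0 else None

    pairs = []
    for id_a, members_a in catalogue_a.items():
        id_b = best_match(members_a, catalogue_b)
        if id_b is None or best_match(catalogue_b[id_b], catalogue_a) != id_a:
            continue
        common = len(members_a & catalogue_b[id_b])
        if (common >= min_overlap * len(members_a)
                and common >= min_overlap * len(catalogue_b[id_b])):
            pairs.append((id_a, id_b))
    return pairs
```

The fraction of cleanly assigned objects is then the number of returned pairs divided by the number of haloes in the reference catalogue.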

We now quantify the number of PINOCCHIO progenitors
that are cleanly assigned to FOF progenitors for each
cleanly assigned parent halo. For this analysis we restrict
ourselves to the ΛCDM case, which gives a wider mass range
but a higher level of non-linearity.

Figure 9. The correlation functions in the ΛCDM case for
progenitors of haloes of mass greater than $10^{14}\,M_\odot$
at redshift $z = 0$ (circles and solid lines), compared with
the correlation function for all the haloes with mass greater
than the threshold mass $M_{\rm tr} = 2.3 \times 10^{11}\,M_\odot$
(triangles and dashed lines). Points refer to FOF-selected
haloes from the simulation and lines to PINOCCHIO haloes. The
second row of plots shows the ratio between the progenitor and
total correlation functions. The redshift increases from left
to right and covers the values $z = 1.0$, $2.0$, $4.0$.

In Fig. 8 we show, for the parents that are cleanly
identified, the fraction by number $f_p$ of the progenitors
that are cleanly identified as well. This quantity is shown
both as a function of the parent mass $M_0$ and as a function
of the progenitor mass $M/M_0$ in units of the parent mass.
The fraction of cleanly identified progenitors ranges from 60
to 100 per cent, with an average value between 80 and 90 per
cent. The fraction $f_p$ is in general higher at higher
redshift, when the object-by-object agreement between
PINOCCHIO and the simulation is better. As a function of
$M_0$, larger parent haloes tend to be reconstructed with
worse accuracy, especially at $z = 1$. This is mainly due to
the small progenitors, as the right panels of Fig. 8 show. The
progenitors that carry less than ~20 per cent of the parent
mass are those that are worst reconstructed. We conclude that
PINOCCHIO is able to reconstruct correctly the main branches
of the merger trees, while secondary branches, especially
present in the larger haloes, are reconstructed in a noisier
way.
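As an illustration of how a number fraction such as $f_p$ could be
tabulated against parent mass, the following Python sketch bins
per-parent progenitor counts. It assumes that the clean-assignment
decision has already been made for every progenitor, using the
criterion defined earlier in the paper; the function name and the
input arrays are hypothetical and are not part of PINOCCHIO.

```python
import numpy as np

def clean_fraction_by_parent_mass(parent_mass, n_prog, n_prog_clean, bin_edges):
    """Fraction by number of cleanly identified progenitors per parent-mass bin.

    parent_mass  : mass M_0 of each cleanly identified parent
    n_prog       : number of progenitors of each parent
    n_prog_clean : number of those progenitors that are cleanly identified
    bin_edges    : edges of the parent-mass bins
    """
    parent_mass = np.asarray(parent_mass, dtype=float)
    n_prog = np.asarray(n_prog, dtype=float)
    n_prog_clean = np.asarray(n_prog_clean, dtype=float)

    # Weighted histograms sum the progenitor counts of the parents in each bin.
    total, _ = np.histogram(parent_mass, bins=bin_edges, weights=n_prog)
    clean, _ = np.histogram(parent_mass, bins=bin_edges, weights=n_prog_clean)

    # Bins without progenitors are returned as NaN rather than 0/0.
    with np.errstate(invalid="ignore", divide="ignore"):
        return np.where(total > 0, clean / total, np.nan)
```

Pooling the counts of all parents in a bin, rather than averaging the
per-parent fractions, weights every progenitor equally; either choice
could be made, and the text does not say which one was adopted.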



4 THE SPATIAL PROPERTIES OF MERGING 
HALOES 

One notable limitation of EPS is the lack of spatial
information for the haloes. Several authors (Mo & White 1996;
Mo, Jing & White 1997; Catelan et al. 1998; Porciani et 
al. 1998) found approximate analytical expressions for the 
bias of haloes of fixed mass, i.e. for the ratio between the 
two-point correlation function of haloes and that of the un- 
derlying matter field. Such analytical estimates have been 
found to agree with the results of simulations to within ~40 
per cent (Mo & White 1996; Jing 1998; Porciani, Catelan 
& Lacey 1999; Sheth, Mo & Tormen 2000; Colberg et al. 
2001). In this approach it is not possible to know how the
bias changes for haloes with different merger histories. This
piece of information is valuable for predicting the bias of
galaxies of different types, which typically have different
merger histories.

As shown in papers I and II, PINOCCHIO haloes have the same
correlation length $r_0$ as FOF haloes to within 10 per cent.
Knowing both merger histories and halo positions, PINOCCHIO
can provide information on the relation between clustering
and merging. To show this, we select PINOCCHIO and FOF haloes
in the ΛCDM cosmology at $z = 0$ with masses greater than
$10^{14}\,M_\odot$. We trace their merging histories at
$z = 1$, 2 and 4, and we evaluate the two-point correlation
functions of their progenitors. In Figure 9 the solid lines
represent the two-point correlation function of progenitors,
$\xi_p(r)$, evaluated in PINOCCHIO, compared with the same
quantity measured in the simulation. The plots show that
PINOCCHIO reproduces such correlation functions to within
~20 per cent.

We also compare this function with the average correlation
function, $\xi_h(r)$, at the same redshifts. It is apparent
that PINOCCHIO correctly reproduces the larger clustering
amplitude of haloes that flow into cluster-sized ones. The
bias between the two halo populations is defined as
$b^2(r,z) = \xi_p(r)/\xi_h(r)$. In the bottom row of plots in
Fig. 9 we compare the bias measured in the simulation with
the PINOCCHIO results. The bias is recovered to within ~20
per cent and the scale dependence is correctly reproduced.
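The comparison above rests on two measurements: a two-point
correlation function for each halo sample and their ratio $b^2$. The
Python sketch below shows one simple way to obtain both for haloes in
a periodic box. It is only an illustration: the text does not state
which estimator was used, so the natural estimator $DD/RR - 1$ with
analytic random counts is an assumption, as are the function names.

```python
import numpy as np

def xi_periodic(pos, box, edges):
    """Two-point correlation function, DD/RR - 1, in a periodic cube.

    pos   : (N, 3) array of positions in [0, box)
    box   : side of the periodic cube
    edges : radial bin edges, all smaller than box / 2
    """
    pos = np.asarray(pos, dtype=float)
    edges = np.asarray(edges, dtype=float)
    n = len(pos)

    # Pair separations with the minimum-image convention.
    # This brute-force step uses O(N^2) memory, so it suits small samples only.
    d = pos[:, None, :] - pos[None, :, :]
    d -= box * np.round(d / box)
    r = np.sqrt((d ** 2).sum(axis=-1))[np.triu_indices(n, k=1)]
    dd, _ = np.histogram(r, bins=edges)

    # Pair counts expected for a uniform distribution with the same N.
    shell = 4.0 / 3.0 * np.pi * (edges[1:] ** 3 - edges[:-1] ** 3)
    rr = 0.5 * n * (n - 1) * shell / box ** 3
    return dd / rr - 1.0

def bias_squared(xi_prog, xi_all):
    """b^2(r) = xi_p(r) / xi_h(r), evaluated bin by bin."""
    return np.asarray(xi_prog, dtype=float) / np.asarray(xi_all, dtype=float)
```

Calling `xi_periodic` once on the progenitor positions and once on all
haloes above the mass threshold, with the same `edges`, gives the two
inputs of `bias_squared`. For catalogues of realistic size a tree-based
pair counter would replace the brute-force distance matrix.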






5 ORBITAL PARAMETERS OF THE 
MERGING HALOES 

In the hierarchical clustering scenario a merging event be- 
tween two or more haloes corresponds to the loss of identity 
of the single primitive units which merge to form a new halo. 
However, high-resolution N-body simulations show that the 
dynamical evolution after an encounter is more complicated 
than this idealized picture: the haloes may retain their iden- 
tity, and become substructure of the new system (Moore, 
Katz & Lake 1996; Tormen 1997; Ghigna et al. 1998; Tormen,
Diaferio & Syer 1998). Indeed, this is in line with the
observed presence of galaxies within galaxy groups and
clusters. The life of these substructures is affected by the
various dynamical effects that contribute to their
disruption. Dynamical friction drives the satellites towards
the center of mass of the system, where they can merge with
the central object or among themselves. While a satellite
orbits inside the main halo, the tidal forces exerted by the
background induce its evaporation and reduce its mass (Gnedin
& Ostriker 1997; Gnedin, Hernquist & Ostriker 1999; Taylor &
Babul 2000; Taffoni et al. 2001; Taffoni et al. 2001, in prep).

The evolution of substructure is one of the crucial points
in modeling galaxy formation. A most important aspect of
this is the prediction of the initial orbital parameters, i.e.
the energy and the angular momentum of the orbit of the
satellites infalling into the main halo. Large-scale
simulations such as those we use in the present paper lack
the resolution needed to address such processes.
High-resolution simulations are necessary to describe the
evolution of satellites (Tormen 1997; Ghigna et al. 1998), at
the cost of simulating one cluster at a time. It is then
useful to resort again to analytic modeling of dynamical
friction and tidal stripping (Chandrasekhar 1943; see e.g.
Binney & Tremaine 1987; Lacey & Cole 1993; van den Bosch et
al. 1999; Colpi, Mayer & Governato 2000). These analytical
models require knowledge of the initial orbital parameters.
In the semi-analytical, EPS-based codes for galaxy formation,
these parameters are in general extracted by Monte Carlo
sampling from distributions obtained from high-resolution
simulations (Tormen 1997; Ghigna et al. 1998).

Within the PINOCCHIO code, it is possible to predict
the impact parameters of the merging satellites, as the
infall velocities and the relative distances are known.
Notice that this calculation is analogous to that of the
angular momentum of haloes presented in paper II. Given the
impact (Zel'dovich) velocity $\Delta\mathbf{v}$ and the
relative distance $\Delta\mathbf{r}$, the angular momentum
and the energy are computed as:

$$ \mathbf{J} = \Delta\mathbf{r} \wedge \Delta\mathbf{v} \tag{5} $$

$$ E = \frac{1}{2} (\Delta\mathbf{v})^2 + \Phi(|\Delta\mathbf{r}|) . \tag{6} $$



$\Phi(|\Delta\mathbf{r}|)$ can be evaluated as the
gravitational potential of a point mass which touches the
external layer of a spherical halo of mass $M$:
$\Phi(|\mathbf{r}|) = -GM/|\mathbf{r}|$. The linear growth of
the relative velocity is stopped at a physical time equal to
half of the merging time (see paper II).
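Equations (5) and (6) amount to a cross product and a sum of kinetic
and potential terms per unit mass. The Python sketch below evaluates
them for a single satellite. It is a minimal illustration under stated
assumptions: the potential is taken with the conventional negative
sign, $\Phi = -GM/|\mathbf{r}|$, so that bound orbits have $E < 0$;
lengths are in Mpc, velocities in km/s and masses in solar masses; the
function name is hypothetical; and the halting of the velocity growth
at half the merging time is not modelled here.

```python
import numpy as np

# Gravitational constant in Mpc (km/s)^2 / Msun.
G = 4.30091e-9

def orbital_parameters(delta_r, delta_v, halo_mass):
    """Specific angular momentum vector J and specific energy E of a satellite.

    delta_r   : relative position of the satellite (Mpc), 3-vector
    delta_v   : relative velocity of the satellite (km/s), 3-vector
    halo_mass : mass M of the main halo (Msun)
    """
    delta_r = np.asarray(delta_r, dtype=float)
    delta_v = np.asarray(delta_v, dtype=float)

    # Equation (5): J = delta_r x delta_v.
    J = np.cross(delta_r, delta_v)

    # Equation (6): kinetic term plus the point-mass potential at |delta_r|.
    phi = -G * halo_mass / np.linalg.norm(delta_r)
    E = 0.5 * np.dot(delta_v, delta_v) + phi
    return J, E
```

For a satellite at 1 Mpc moving tangentially at 100 km/s around a
$2 \times 10^{14}\,M_\odot$ halo, the kinetic term is 5000 (km/s)^2 and
the orbit is bound, because the potential term is far larger in
magnitude.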

Figure 10. The distribution of $\Theta_{\rm orb}$ for the
satellites that merge with a halo of mass
$M = 2 \times 10^{14}\,M_\odot$ at $z = 0$.

To study the ability of PINOCCHIO to predict the orbital
parameters of DM substructures, we compute the distribution
of the orbital parameters of the satellites that merge with a
halo of mass $M = 2 \times 10^{14}\,M_\odot$. We express the
angular momentum and the energy per unit mass in terms of the
circularity $\epsilon = J/J_c(E)$, where
$J_c(E) = V_c\,r_c(E)$ is the angular momentum of the circular
orbit with the same energy and $r_c(E)$ is its radius (see
e.g. van den Bosch et al. 1999). To do that we assume a
Navarro, Frenk & White (1996) density profile for the main
halo and we calculate the associated potential energy profile
$\Phi_{\rm NFW}(R)$ and the associated circular velocity
profile $V_c(R)$ (see e.g. Navarro, Frenk & White 1996; Klypin
et al. 1999; Taffoni et al., in prep). We consider a
particular combination of the orbital parameters




$$ \Theta_{\rm orb} = \epsilon^{0.78} \left[ \frac{r_c(E)}{R_{\rm vir}} \right]^2 \tag{7} $$

introduced by Cole et al. (2001). They use the Tormen
(1997) results to derive the distribution of the
$\Theta_{\rm orb}$ factor, and they find that this
distribution can be fitted with a log-normal function with
mean value $\langle \log_{10}\Theta_{\rm orb} \rangle = -0.14$
and dispersion
$\langle (\log_{10}\Theta_{\rm orb} - \langle \log_{10}\Theta_{\rm orb} \rangle)^2 \rangle^{0.5} = 0.26$.
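The steps described above (NFW potential and circular velocity, radius
of the circular orbit of equal energy, circularity, and the combined
factor) can be sketched in Python as follows. This is an illustrative
sketch, not the procedure used in the paper: it assumes an untruncated
NFW halo specified by a virial mass, virial radius and concentration,
takes the exponents 0.78 and 2 from the Cole et al. definition of
$\Theta_{\rm orb}$, and finds $r_c(E)$ by bisection. All function names
are hypothetical; units are Mpc, km/s and solar masses.

```python
import numpy as np

G = 4.30091e-9  # gravitational constant in Mpc (km/s)^2 / Msun

def nfw_mu(x):
    """Dimensionless NFW mass function ln(1 + x) - x / (1 + x)."""
    return np.log1p(x) - x / (1.0 + x)

def nfw_potential(r, m_vir, r_vir, c):
    """Potential of an untruncated NFW halo (assumed form)."""
    r_s = r_vir / c
    return -G * m_vir * np.log1p(r / r_s) / (r * nfw_mu(c))

def nfw_vcirc(r, m_vir, r_vir, c):
    """Circular velocity V_c(r) = sqrt(G M(<r) / r)."""
    r_s = r_vir / c
    return np.sqrt(G * m_vir * nfw_mu(r / r_s) / (nfw_mu(c) * r))

def circular_orbit_energy(r, m_vir, r_vir, c):
    """Specific energy of the circular orbit of radius r."""
    return 0.5 * nfw_vcirc(r, m_vir, r_vir, c) ** 2 + nfw_potential(r, m_vir, r_vir, c)

def circular_radius(E, m_vir, r_vir, c, lo=1e-6, hi=1e3):
    """Radius r_c(E) of the circular orbit with specific energy E.

    The circular-orbit energy increases monotonically with radius, so a
    bisection in log r between lo * r_vir and hi * r_vir is sufficient.
    """
    a, b = lo * r_vir, hi * r_vir
    if not (circular_orbit_energy(a, m_vir, r_vir, c) < E
            < circular_orbit_energy(b, m_vir, r_vir, c)):
        raise ValueError("E is outside the bracketed range of circular orbits")
    for _ in range(200):
        mid = np.sqrt(a * b)
        if circular_orbit_energy(mid, m_vir, r_vir, c) < E:
            a = mid
        else:
            b = mid
    return np.sqrt(a * b)

def theta_orb(E, J, m_vir, r_vir, c, alpha=0.78, beta=2.0):
    """Theta_orb = eps**alpha * (r_c / r_vir)**beta, with eps = J / J_c(E).

    J is the modulus of the specific angular momentum.
    """
    r_c = circular_radius(E, m_vir, r_vir, c)
    j_c = nfw_vcirc(r_c, m_vir, r_vir, c) * r_c
    return (J / j_c) ** alpha * (r_c / r_vir) ** beta
```

A circular orbit at half the virial radius has $\epsilon = 1$ and
therefore $\Theta_{\rm orb} = 0.25$, which provides a simple check of
the root finder. Whether the halo potential is truncated at the virial
radius changes $r_c(E)$ for weakly bound orbits, so that choice should
be matched to the profile actually adopted.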

The results of our analysis are presented in Fig. 10, where
we compare the distribution of the $\Theta_{\rm orb}$ factor
measured in PINOCCHIO (histogram) with the theoretical fit
derived by Cole et al. (2001) (solid line). We note that the
distribution measured from PINOCCHIO reproduces the
log-normal function with good accuracy. The average value
derived from our analysis is
$\langle \log_{10}\Theta_{\rm orb} \rangle = -0.18$ and the
dispersion is
$\langle (\log_{10}\Theta_{\rm orb} - \langle \log_{10}\Theta_{\rm orb} \rangle)^2 \rangle^{0.5} = 0.23$.
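For comparison, the Monte Carlo extraction used in semi-analytic codes
can be sketched in a few lines of Python: $\log_{10}\Theta_{\rm orb}$
is drawn from a Gaussian with the mean and dispersion of the fit
quoted above, and the same two moments can then be measured on any
sample. The function names and the default seed handling are
illustrative assumptions.

```python
import numpy as np

def sample_theta_orb(n, mean_log10=-0.14, sigma_log10=0.26, seed=None):
    """Draw n values of Theta_orb from a log-normal distribution.

    The defaults are the mean and dispersion of log10(Theta_orb)
    quoted in the text for the fit to the Tormen (1997) results.
    """
    rng = np.random.default_rng(seed)
    return 10.0 ** rng.normal(mean_log10, sigma_log10, size=n)

def log10_moments(theta):
    """Mean and dispersion of log10(theta) for a sample of positive values."""
    logs = np.log10(np.asarray(theta, dtype=float))
    return logs.mean(), logs.std()
```

Applied to the PINOCCHIO satellites, `log10_moments` would return the
two numbers quoted above; they differ from the fitted values by 0.04
dex in the mean and 0.03 dex in the dispersion.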



6 CONCLUSIONS 

We have tested the predictions of the PINOCCHIO code, 
presented in paper I and paper II, regarding the hierarchical 
nature of halo formation, with particular attention to those 
aspects that are most relevant for galaxy formation. We
have compared the results of PINOCCHIO with those of two
large N-body simulations (ΛCDM and SCDM cosmologies),
drawing the following conclusions:

1. The merger histories of the PINOCCHIO haloes resemble 
closely those found applying the FOF algorithm to the N- 
body simulations. The agreement is valid at the statistical 
level for groups of at least 30 particles (good results are 
obtained even for haloes of 10 particles). 

2. Statistical quantities like the conditional mass function, 
the distribution of the largest progenitor, the ratio of the 
second largest to largest progenitors, and the higher mo- 
ments of the progenitor distributions are recovered with a 
typical accuracy of ~20 per cent. 

3. The agreement is good also at the object-by-object level,
as PINOCCHIO cleanly reproduces ≳70 per cent of the
progenitors when parent haloes are cleanly recognised
themselves. The agreement slowly degrades with time.

4. The increased noise recovered in the object-by-object 
agreement, as time progresses and non-linearity grows, does 
not influence the accuracy of the predictions in a statistical 
sense. 

5. The correlation function of higher-redshift haloes that
are progenitors of lower-redshift massive haloes is correctly
reproduced to within an accuracy of ~10 per cent in $r_0$. The
scale-dependent bias of these with respect to the total halo
population is also reproduced to within an accuracy of 20
per cent or better.

6. PINOCCHIO gives an estimate of the initial orbital pa- 
rameters of merging haloes as well, which is found in reason- 
able agreement with available results from high-resolution 
N-body simulations. 

Compared with the widely used EPS approach (Bond 
et al. 1991; Bower 1991; Lacey & Cole 1993), PINOCCHIO 
improves in most respects. 

1. The fit of the statistical quantities achieved by PINOC- 
CHIO is much better than the PS and EPS estimates, which 
show discrepancies up to a factor of 2 (see Governato et al. 
2000; Jenkins et al. 2001; Bode et al. 2001; Somerville et al. 
1999; Sheth & Lemson 1999b, Bagla et al. 199?). 

2. The validity of PINOCCHIO extends to the object-by- 
object level, in contrast to EPS (Bond et al. 1991; White 
1996). 

3. PINOCCHIO is not affected by the inconsistencies of
the EPS approach, which can be corrected only by means
of heuristic recipes (Somerville & Kolatt 1999; Sheth &
Lemson 1999; Cole et al. 2001).

4. PINOCCHIO provides much more useful information on
the haloes, such as positions, velocities, angular momenta
and initial orbital parameters at merger; at the same
time it is no more computationally demanding than an
EPS-based code in generating the merging histories of haloes.

These results confirm the validity of PINOCCHIO as a 
fast and flexible tool to study galaxy formation or to gen- 
erate catalogues of galaxies or galaxy clusters, suitable for 
large-scale structure studies. In fact PINOCCHIO repro- 
duces, in a much quicker way and to a very good level of 
accuracy, all the information that can be obtained from a 
large-scale N-body simulation with pure dark matter, with- 
out needing all the post processing necessary to obtain the 
merger trees of haloes. 



PINOCCHIO is available at
http://www.daut.univ.trieste.it/~pinocchio.